Abstract. We consider the number of vertices that must be removed
from a graph G in order that the remaining subgraph has no component
with more than k vertices. Our principal observation is that, if G is
a sparse random graph or a random regular graph on n vertices with
$n \to \infty$, then the number in question is essentially the same for all values
of k that satisfy both $k \to \infty$ and $k = o(n)$.



The process of removing vertices from a graph G so that the remaining 
subgraph has only small components is known as fragmentation. Typically, 
the aim is to remove the least possible number of vertices to achieve a 
given component size; this is equivalent to determining the largest induced 
subgraph whose components are at most that size. This process has been 
studied in (at least) two different lines of research, from different perspectives 
and with quite different component sizes. In this note we point out that, as 
far as sparse random graphs are concerned, these two perspectives actually 
arrive at the same answer. 

Let $\Gamma$ be a class of graphs. The classes we shall mostly be interested
in are $\mathcal{C}_k$, the class of graphs whose components have at most
k vertices, and $\mathcal{F}$, the class of forests. Given such a class $\Gamma$, we define

$$N(G,\Gamma) := \max\{|S| : G[S] \in \Gamma\},$$

where S is a subset of the vertices of G and G[S] denotes the subgraph of
G induced by S. We also define

$$\nu(G,\Gamma) := N(G,\Gamma)/|G|,$$

so that $0 \le \nu(G,\Gamma) \le 1$. (To make this always defined, we set $N(G,\Gamma) = 0$
if no induced subgraph of G belongs to $\Gamma$; equivalently, we may regard the
empty graph with no vertices as an element of $\Gamma$.) Thus, for example, the
size of a largest independent set in G is $N(G,\mathcal{C}_1) = \nu(G,\mathcal{C}_1)|G|$. (This is
known as the independence number.) Similarly, $n - N(G,\mathcal{F})$ is the decycling
number, see e.g. Karp [16].
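
To make the definitions concrete, here is a minimal brute-force sketch in Python. It assumes the graph is given as a vertex count and an edge list; the names `largest_component` and `N_Ck` are illustrative and do not come from the paper. It computes $N(G,\mathcal{C}_k)$ by trying vertex subsets from largest to smallest, so it is only usable on very small graphs.

```python
from itertools import combinations

def largest_component(kept, adj):
    """Size of the largest component of the subgraph induced by `kept`."""
    unseen = set(kept)
    best = 0
    while unseen:
        stack = [unseen.pop()]
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w in unseen:
                    unseen.remove(w)
                    stack.append(w)
        best = max(best, size)
    return best

def N_Ck(n, edges, k):
    """Brute-force N(G, C_k) for a graph on vertices 0..n-1 (exponential time)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Try subset sizes from n downwards; the empty set always qualifies.
    for size in range(n, -1, -1):
        for kept in combinations(range(n), size):
            if largest_component(kept, adj) <= k:
                return size
    return 0

# Example: the 5-cycle. For k = 1 this is the independence number.
cycle5 = [(i, (i + 1) % 5) for i in range(5)]
print([N_Ck(5, cycle5, k) for k in (1, 2, 4, 5)])  # [2, 3, 4, 5]
```

Dividing the returned value by n gives $\nu(G,\mathcal{C}_k)$; for the 5-cycle with k = 1 this is 2/5.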

In this notation, the study of fragmentation is the study of the parameter
$\nu(G,\mathcal{C}_k)$ for various values of k. From the point of view of graph theory,
it is natural to consider $\nu(G,\mathcal{C}_k)$ for some large but finite value of k, for
graphs G in which the number of vertices $n = |G|$ grows large. This study
was initiated by Edwards and Farr 0; 0]. On the other hand, in the study
of vaccination, see for example Britton, Janson and Martin-Löf [4] and the
references therein, the vertices of the graph are individuals in some population,
with edges representing the opportunity of passing on a disease. If a
vertex is vaccinated it becomes unable to spread the disease; a vaccination
strategy is a way to ensure that the subgraph induced by the unvaccinated
vertices has only small components (relative to the total population). The
vaccinator is thus interested in $\nu(G,\mathcal{C}_{\delta n})$ for small values of $\delta$. (For further
details and for variations on this theme, see 0].) 

In both studies it is natural to consider the behaviour of these parameters 
on two standard models of sparse random graphs. Let $\mathcal{G}(n,c/n)$ denote the
probability space of graphs with vertex set $\{1,\dots,n\}$ with edges chosen
independently with probability $c/n$, and let $\mathcal{G}_d(n)$ denote the space of
d-regular graphs on the same vertex set. We shall assume that $c > 1$ and
$d \ge 3$ are fixed as $n \to \infty$. (For odd d, we of course have to assume that
n is even.) The main observation of this paper is that, for graphs in these
spaces, the graph theoretic approach and the vaccination approach arrive at
the same answer: that is, perhaps surprisingly, fragmenting into large but
finite components whp costs no more than just fragmenting into components
of size $o(|G|)$. (A sequence of events $(A_n)$ is said to hold whp if $\Pr(A_n) \to 1$
as $n \to \infty$.)

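Theorem 1. Let $c > 1$, $d \ge 3$ and $\varepsilon > 0$ be given. Then there exists $\delta > 0$
such that, if $G \in \mathcal{G}(n,c/n)$ or $G \in \mathcal{G}_d(n)$ then

$$\nu(G,\mathcal{C}_{\delta n}) \le \nu(G,\mathcal{C}_{1/\delta}) + \varepsilon$$

holds whp as $n \to \infty$.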

Before giving the proof of this theorem, we make a few more remarks. 
Edwards and Farr [5|; [3] considered general graphs of bounded maximum 
degree; in particular they studied the parameter 

$$\beta_d := \sup_k \min\{\nu(G,\mathcal{C}_k) : G \text{ has maximum degree } d\}$$

(note that the corresponding parameter of Edwards and Farr equals $1 - \beta_d$). One way to think of
this parameter is that, if $\beta < \beta_d$, then there is some finite k for which every
graph G of maximum degree d has an induced subgraph G[S] with at least
$\beta|G|$ vertices but with no component larger than k. Trivially $\beta_1 = \beta_2 = 1$,
and it is shown in [5] that $\beta_3 = 3/4$. In general they showed that $\beta_d \ge 3/(d+1)$;
a complementary upper bound on $\beta_d$ was proved by Haxell, Pikhurko and
Thomason [13] (so answering affirmatively the question posed by Edwards
and Farr 0] as to whether $\beta_d \to 0$ as $d \to \infty$).

The parameter $\nu(G,\mathcal{F})$, describing the largest induced forest in G, is
significant in the study of fragmentation, because any forest F is easily 
fragmented by removing a few vertices. The following simple lemma is given 
only because it is best possible, as exemplified by a path. 

Lemma 2. If F is a forest then $\nu(F,\mathcal{C}_k) \ge 1 - (k+1)^{-1}$.

Proof. We may assume that F is a tree, and proceed by induction on $n = |F|$,
the case $n \le k + 1$ being trivial. For larger n, note that the removal of any
edge leaves two components. Orient the edge towards the larger of these
(break ties arbitrarily) and colour the edge red if both components have
more than k vertices. If there are no red edges, remove a sink vertex (there
must be one since F is acyclic) and observe that this leaves only components
with at most k vertices, and thus $\nu(F,\mathcal{C}_k) = 1 - 1/n \ge 1 - 1/(k+1)$. If
there are red edges, it is easy to see that they form a connected subgraph
and so a tree; the removal of a leaf vertex of the red tree breaks F into a
tree with at most $n - k - 1$ vertices plus some components of size at most k,
and the proof follows by induction. □
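
The bound of Lemma 2 can also be reached algorithmically. The sketch below is not the red-edge induction used in the proof; it is a standard bottom-up greedy, written in Python under the assumption that the forest is given as a vertex count and an edge list. The names `fragment_forest` and `component_sizes` are illustrative. Each removed vertex is charged to at least k + 1 vertices of its own, so at most n/(k+1) vertices are removed.

```python
from collections import defaultdict

def fragment_forest(n, edges, k):
    """Return a set R of vertices of a forest on 0..n-1 such that deleting R
    leaves only components with at most k vertices, with |R| <= n / (k + 1)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    removed = set()
    seen = [False] * n
    residual = [0] * n  # surviving vertices still attached below each vertex
    for root in range(n):
        if seen[root]:
            continue
        parent = {root: None}
        order = []  # every vertex appears after its parent
        stack = [root]
        seen[root] = True
        while stack:
            v = stack.pop()
            order.append(v)
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    parent[w] = v
                    stack.append(w)
        for v in reversed(order):  # children are handled before parents
            residual[v] += 1       # count v itself
            if residual[v] >= k + 1:
                removed.add(v)     # cut here; nothing is passed upwards
            elif parent[v] is not None:
                residual[parent[v]] += residual[v]
    return removed

def component_sizes(n, edges, removed):
    """Sorted component sizes of the graph after deleting `removed`."""
    adj = defaultdict(list)
    for u, v in edges:
        if u not in removed and v not in removed:
            adj[u].append(v)
            adj[v].append(u)
    unseen = set(range(n)) - set(removed)
    sizes = []
    while unseen:
        stack = [unseen.pop()]
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w in unseen:
                    unseen.remove(w)
                    stack.append(w)
        sizes.append(size)
    return sorted(sizes)

# Example: a path on 10 vertices with k = 3.
path10 = [(i, i + 1) for i in range(9)]
cut = fragment_forest(10, path10, 3)
print(sorted(cut), component_sizes(10, path10, cut))  # [2, 6] [2, 3, 3]
```

On the path with 10 vertices and k = 3 the sketch removes 2 vertices, which matches the floor of n/(k+1), in line with the remark that the lemma is best possible for a path.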

The parameter $\nu(G,\mathcal{P})$, where $\mathcal{P}$ is the class of planar graphs, is likewise
significant for fragmentation, because a planar graph P can be fragmented
quite efficiently by means of the separator theorem of Lipton and Tarjan [17],
and in fact $\nu(P,\mathcal{C}_k) \ge 1 - 24k^{-1/2}$ holds (see § or 181). From this and from
Lemma 2 it follows that, for any graph G, both $\nu(G,\mathcal{C}_k) \ge \nu(G,\mathcal{F}) - (k+1)^{-1}$
and $\nu(G,\mathcal{C}_k) \ge \nu(G,\mathcal{P}) - 24k^{-1/2}$ hold. On the other hand, if $n_k(G)$ denotes
the number of cycles in G of length at most k, then (by removing a vertex
from each cycle) we have $\nu(G,\mathcal{P}) \ge \nu(G,\mathcal{F}) \ge \nu(G,\mathcal{C}_k) - n_k(G)/|G|$. Thus
if we restrict our attention to large graphs G for which $n_k(G) = o(|G|)$ for
each fixed k, then the three parameters $\nu(G,\mathcal{F})$, $\nu(G,\mathcal{P})$ and $\nu(G,\mathcal{C}_k)$ are
asymptotically the same for large k. Graphs in $\mathcal{G}(n,c/n)$ or $\mathcal{G}_d(n)$ enjoy this
property whp (see 0] or [ll]). 

The parameters $\nu(G,\mathcal{C}_1)$ and $\nu(G,\mathcal{F})$ for $G \in \mathcal{G}(n,c/n)$ and $G \in \mathcal{G}_d(n)$
have already received considerable attention. The first of these (the independence
number) was studied by Frieze [10] for $\mathcal{G}(n,c/n)$, see also [15, Section
7.1], and Frieze and Łuczak [11] for $G \in \mathcal{G}_d(n)$, and information on the second
is given by Bau, Wormald and Zhou [1]. In fact it is shown in [11] that,
for $G \in \mathcal{G}_d(n)$, $\nu(G,\mathcal{C}_1) \sim 2\log d/d$ holds whp for large d, which is already
enough to answer the above-mentioned question about the limit of $\beta_d$, and
in [13] it is verified that $\nu(G,\mathcal{F}) \sim 2\log d/d$ whp. (These statements involve
double limits, as first $n \to \infty$ and then $d \to \infty$. More precisely, by [11; 13],
for every $\varepsilon > 0$, there exists $d_\varepsilon$ such that for $\mathcal{G}_d(n)$ with any fixed $d \ge d_\varepsilon$,
whp $(2 - \varepsilon)\log d/d \le \nu(G,\mathcal{C}_1) \le \nu(G,\mathcal{F}) \le (2 + \varepsilon)\log d/d$ holds as $n \to \infty$.
A similar result holds for $\mathcal{G}(n,c/n)$, by [10] and a first moment argument as
in 0.) 

We are now ready for the proof of Theorem 1.

Proof of Theorem 1. We claim that the following holds whp, if $\delta$ is small
enough: Each set T of at most $\delta n$ vertices spans at most $(1 + \varepsilon/3)|T|$ edges.

The theorem follows from this claim; for let S be a set such that $G[S] \in \mathcal{C}_{\delta n}$
and $|S| = \nu(G,\mathcal{C}_{\delta n})\,n$. By the claim, from each component G[T] of G[S] we
may remove at most $\varepsilon|T|/3$ edges so that it becomes acyclic or unicyclic;
thus, by removing at most $\varepsilon|S|/3$ edges we can make all components of G[S] acyclic
or unicyclic. There are at most $\varepsilon|S|/3$ components of size larger than $3/\varepsilon$,
and so by removing a further $\varepsilon|S|/3$ edges we can make all these large
components acyclic. Hence, removing vertices instead of edges, there exists
$S' \subseteq S$, $|S'| \ge |S| - 2\varepsilon n/3$, such that $G[S']$ consists of a forest F plus
components of size at most $3/\varepsilon$. By Lemma 2 the removal of a further $\varepsilon|S'|/3$
vertices from $G[S']$ leaves only components of size at most $3/\varepsilon$. Therefore
$\nu(G,\mathcal{C}_{3/\varepsilon}) \ge \nu(G,\mathcal{C}_{\delta n}) - \varepsilon$. We can of course assume that $\delta \le \varepsilon/3$, so that
$\nu(G,\mathcal{C}_{1/\delta}) \ge \nu(G,\mathcal{C}_{3/\varepsilon})$, and this proves the theorem.

To prove the claim, consider first the case $G \in \mathcal{G}(n,c/n)$. Let T be a set
of $t \le \delta n$ vertices, and let $r = n/t \ge 1/\delta$; we can make r large by making
$\delta$ small. Let X be the random variable counting the number of edges in
G[T]. Then X is binomially distributed with mean $\lambda = \binom{t}{2}\frac{c}{n} \le \frac{ct}{2r}$. By the
version of the Chernoff bound in [15, Corollary 2.4], if $x = m\lambda \ge \lambda$, then
$\Pr(X \ge x) \le \exp(-Ix)$, where $I = \log m - 1 + 1/m$. Taking $x = (1 + \varepsilon/4)t$
we have $m = x/\lambda \ge (1 + \varepsilon/4)2r/c > 1$ if $\delta < 2/c$, and $I \ge \log m - 1 \ge
\log r - 1 - \log c$. The number of sets of size t is $\binom{n}{t} \le (er)^t$. Therefore the
probability $P_t$ that the claim fails for some set T of size t satisfies

$$P_t \le \exp\{t(1 + \log r) - t(1 + \varepsilon/4)(\log r - 1 - \log c)\} \le \exp\{-\varepsilon t(\log r)/8\}$$

if $\delta$ is small enough. For $t \ge \log n$ we have $P_t \le n^{-2}$ if $\delta$ is small, and for
$t < \log n$ we have $\log r \ge (\log n)/2$ so $P_t \le n^{-\varepsilon/16}$. Thus $\sum_{1 \le t \le \delta n} P_t = o(1)$,
which proves the claim.

The case $G \in \mathcal{G}_d(n)$ is similar. The calculation is messier but fortunately
we need not give it here, because it is essentially that of the proof
of Lemma 5.1 of Janson and Łuczak [14]. They prove (see Remark 5.2)
that each set T with $|T| \le \delta n$ has average degree less than k, where $k \ge 3$.
However their interest was in integer values of k, and the proof, in which k
appears everywhere as a variable, works perfectly well for any fixed $k > 2$,
which is exactly our claim. □

In the light of Theorem 1 it is interesting to consider the fragmentation of
random graphs G into components of size proportional to |G|. Now, given
$c > 0$ and $0 < x < 1$, the inequalities of Azuma-Hoeffding [15, Corollary
2.27] or of Talagrand [15, Theorem 2.29 and Remark 2.36] show that the
value of $\nu(G,\mathcal{C}_{xn})$ for $G \in \mathcal{G}(n,c/n)$ is highly concentrated. It seems very
likely that there is a real number $f_c(x)$ such that $\nu(G,\mathcal{C}_{xn})$ tends to $f_c(x)$
in probability; that is, for all $\varepsilon > 0$, $\Pr\{|\nu(G,\mathcal{C}_{xn}) - f_c(x)| > \varepsilon\} \to 0$ as
$n \to \infty$. But it is not actually known whether $f_c(x)$ exists; this is an
unfortunate state of affairs shared with many standard parameters such as
$\nu(G,\mathcal{C}_1)$ and $\nu(G,\mathcal{F})$. (An exception here is $\nu(G,\mathcal{C}_1)$ when $c < e$; see
Corollary 1].) Nevertheless, our final comments can be stated more cleanly
by assuming both that $f_c(x)$ exists, and also that, for each fixed k, $\nu(G,\mathcal{C}_k)$
tends to a limit $\varphi_c(k)$ in probability. Corresponding limits will be assumed
for $G \in \mathcal{G}_d(n)$ too, and we denote these by $g_d(x)$ and $\gamma_d(k)$. (Note that, at
least, one can show that $\nu(G,\mathcal{C}_{xn})$ and $\nu(G,\mathcal{C}_k)$ are highly concentrated for
$G \in \mathcal{G}_d(n)$ too, using a version for random permutations of the inequality
by Talagrand [20, Theorem 5.1], see also McDiarmid [19, Theorem 1.1], and
arguing as in Haxell, Pikhurko and Thomason [13].)

We interpret $f_c(x)$ as meaning that the largest induced subgraph of a random
graph $G \in \mathcal{G}(n,c/n)$ having no component larger than xn has (about)
$f_c(x)n$ vertices, and the largest induced subgraph having no component
larger than k has (about) $\varphi_c(k)n$ vertices. The numbers $g_d(x)$ and $\gamma_d(k)$ have
similar interpretations. All these functions are increasing in their arguments.
So the limits $\lim_{x \to 0} f_c(x)$ and $\lim_{x \to 0} g_d(x)$ exist; we define $f_c(0)$ and $g_d(0)$
to be the values of these limits, thus making $f_c$ and $g_d$ continuous at zero.
The content of Theorem 1 is then that

$$\lim_{k \to \infty} \varphi_c(k) = f_c(0) := \lim_{x \to 0} f_c(x) \quad\text{and}\quad \lim_{k \to \infty} \gamma_d(k) = g_d(0) := \lim_{x \to 0} g_d(x).$$

[Note that $\lim_{k \to \infty} \gamma_d(k)$ corresponds very closely to the parameter $\beta_d$ defined
earlier, the only difference being that $\beta_d$ takes account of all d-regular graphs,
whereas here we consider only almost all.]

The function $f_c$ satisfies the Lipschitz condition $f_c(x) - f_c(y) \le \frac{1}{y}(x - y)$
for $y < x$, since, if G[S] has component sizes at most xn, then at most 1/y
of these are larger than yn, and each of these can be reduced to size yn by
removing at most $(x - y)n$ vertices. This, together with the fact that $f_c$ is
increasing, shows that $f_c$ is a continuous function on the unit interval, and
similarly so is $g_d$.

The famous theorem of Erdős and Rényi [9], that $G \in \mathcal{G}(n,c/n)$ almost
certainly has a unique giant component of size $(1 + o(1))\rho(c)n$, where $\rho(c) =
1 - e^{-c\rho(c)}$, means that $f_c(x) = 1$ for $\rho(c) \le x \le 1$. There is no corresponding
fact for random regular graphs, of course; we just have $g_d(1) = 1$.
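
For a feel of the threshold $\rho(c)$, the fixed-point equation $\rho = 1 - e^{-c\rho}$ can be solved numerically. The following Python sketch uses plain fixed-point iteration started from 1; the function name `giant_fraction` and the iteration count are assumptions made for illustration, and convergence becomes slow when c is close to 1.

```python
import math

def giant_fraction(c, iterations=200):
    """Approximate the largest solution in [0, 1] of rho = 1 - exp(-c * rho)."""
    rho = 1.0  # start above the root; the iterates decrease towards it
    for _ in range(iterations):
        rho = 1.0 - math.exp(-c * rho)
    return rho

print(round(giant_fraction(2.0), 4))  # 0.7968
```

For c = 2 this gives $\rho(2) \approx 0.797$, so in that case $f_c(x) = 1$ for all x from about 0.797 upwards.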

It seems likely that the function $f_c$ is strictly increasing on the interval
$[0,\rho(c)]$ and that $g_d$ is strictly increasing on [0,1]. This would mean that
continuous inverse functions $f_c^{-1}$ and $g_d^{-1}$ exist. For the vaccinator, the
function $f_c^{-1}$ would be more natural than $f_c$ itself; $f_c^{-1}(z)n$ is the smallest
component size achievable by vaccinating $(1 - z)n$ people. But we cannot
show strict monotonicity except at the right-hand end of the range. It was
proved by Bollobás, Janson and Riordan [3, Theorem 3.9] that for every
$\varepsilon > 0$ there exists $\delta > 0$ such that whp after removal of any $\delta n$ vertices from
$\mathcal{G}(n,c/n)$, there is still a giant component of order at least $(\rho(c) - \varepsilon/2)n$,
and thus $f_c(\rho(c) - \varepsilon) \le 1 - \delta = f_c(\rho(c)) - \delta$; this is an easy consequence of
the corresponding result for edge deletions by Łuczak and McDiarmid [18,
Lemma 2], who proved that for every $\varepsilon > 0$ there exists $\delta > 0$ such that the
giant component whp has no two sets, each of size at least $\varepsilon n$, having at
most $\delta n$ edges between the sets. A similar argument can be given to show
that $g_d(1 - \varepsilon) \le g_d(1) - \delta$.

In conclusion, our main open question is whether $f_c$ and $g_d$ are strictly
increasing. We would also like to know more about the subgraphs G[S] of
order $f_c(x)n$ or $g_d(x)n$ that have no component larger than xn: how many
components do they have? The reader who is interested in these questions
can readily formulate them in a way that does not involve the uncertain
existence of $f_c$ and $g_d$.

SVANTE JANSON AND ANDREW THOMASON

We finally remark that corresponding questions can be formulated for
removal of edges instead of vertices. In that case, the central parameter is
the largest number of edges in a (not necessarily induced) subgraph of G
that belongs to $\Gamma$. For $\mathcal{F}$, the class of forests, this is easy (unlike the vertex
case treated above, see Karp [16]), but we see no easy answers for e.g. $\mathcal{C}_{\delta n}$
and leave these versions as problems for the interested reader.